Abstract

A birthday surprise is the event that, given k uniformly random samples from a 
sample space of size n, at least two of them are identical. We show that Bernoulli 
numbers can be used to derive arbitrarily exact bounds on the probability of a 
birthday surprise. This result can be used in arbitrary precision calculators, and it 
can be applied to better understand some questions in communication security and 
pseudorandom number generation. 

1 Introduction



In this note we address the probability $\beta_n^k$ that in a sample of k
uniformly random elements out of a space of size n there exist at least two
identical elements. This problem has a long history and a wide range of
applications. The term birthday surprise for a collision of (at least) two
elements in the sample comes from the case n = 365, where the problem can be
stated as follows: Assuming that birthdays are uniformly distributed over the
year, what is the probability that in a class of k students, at least two have
the same birthday?

It is clear (and well known) that the expected number of collisions (or common
birthdays) in a sample of k out of n is:

\[ \binom{k}{2}\frac{1}{n} = \frac{k(k-1)}{2n}. \]

(Indeed, for each pair of indices $i < j$ in the range $\{1, \dots, k\}$, let
$X_{ij}$ be the random variable taking the value 1 if samples i and j obtained
the same value and 0 otherwise. Then the expected number of collisions is
$E(\sum_{i<j} X_{ij}) = \sum_{i<j} E(X_{ij}) = \sum_{i<j} \frac{1}{n} = \binom{k}{2}\frac{1}{n}$.)

Thus, 28 students are enough to make the expected number of common birthdays
greater than 1. This seemingly surprising phenomenon is known as the
birthday surprise, or birthday paradox.
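
The count of 28 students can be checked with a few lines of exact arithmetic. The following is a minimal Python sketch; the function names `expected_collisions` and `smallest_k_exceeding` are chosen here for illustration and do not come from the note.

```python
from fractions import Fraction


def expected_collisions(k: int, n: int) -> Fraction:
    """Expected number of colliding pairs among k uniform samples from n values."""
    return Fraction(k * (k - 1), 2 * n)


def smallest_k_exceeding(threshold: Fraction, n: int) -> int:
    """Smallest k whose expected number of colliding pairs is greater than threshold."""
    k = 1
    while expected_collisions(k, n) <= threshold:
        k += 1
    return k


# 28*27 = 756 > 730 = 2*365, while 27*26 = 702 is still too small.
print(smallest_k_exceeding(Fraction(1), 365))  # prints 28
```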

In several applications, it is desirable to have exact bounds on the probability 
of a collision. For example, if some electronic application chooses pseudoran- 
dom numbers as passwords for its users, it may be a bad surprise if two users 
get the same password by coincidence. It is this term "by coincidence" that 
we wish to make precise. 



2 Bounding the probability of a birthday surprise 



When k and n are relatively small, it is a matter of simple calculation to
determine $\beta_n^k$. The probability that all samples are distinct is:

\[ \pi_n^k = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right) = \prod_{i=1}^{k-1}\left(1 - \frac{i}{n}\right), \]

and $\beta_n^k = 1 - \pi_n^k$. For example, one can check directly that
$\beta_{365}^{23} > 1/2$, that is, in a class of 23 students the probability
that two share the same birthday is greater than 1/2. This is another variant
of the birthday surprise.^1
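
For small k and n the direct calculation can be carried out exactly with rational arithmetic. The sketch below is one possible way to do so in Python; the names `pi_distinct` and `beta` are illustrative choices, not notation from the note.

```python
from fractions import Fraction


def pi_distinct(k: int, n: int) -> Fraction:
    """Probability that k uniform samples from n values are all distinct."""
    prob = Fraction(1)
    for i in range(1, k):
        prob *= Fraction(n - i, n)  # factor (1 - i/n)
    return prob


def beta(k: int, n: int) -> Fraction:
    """Probability of at least one collision among k samples from n values."""
    return 1 - pi_distinct(k, n)


# 22 students are not enough to pass 1/2, 23 are.
print(beta(22, 365) < Fraction(1, 2) < beta(23, 365))  # prints True
```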

The calculation becomes problematic when k and n are large, both due to pre- 
cision problems and computational complexity (in cryptographic applications 
k may be of the order of trillions, i.e., thousands of billions). This problem 
can be overcome by considering the logarithm of the product: 

\[ \ln(\pi_n^k) = \sum_{i=1}^{k-1} \ln\left(1 - \frac{i}{n}\right). \]
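
In floating point, the logarithmic form might be evaluated as in the following sketch. The function name `beta_via_logs` is an illustrative choice; `math.log1p`, `math.fsum` and `math.expm1` are used so that tiny terms are not lost to rounding. The loop still performs k - 1 additions, so this addresses only the precision problem, not the running time.

```python
import math


def beta_via_logs(k: int, n: int) -> float:
    """Collision probability computed from the sum of ln(1 - i/n), i = 1..k-1."""
    log_pi = math.fsum(math.log1p(-i / n) for i in range(1, k))
    # 1 - exp(log_pi), evaluated accurately even when log_pi is close to 0.
    return -math.expm1(log_pi)


print(beta_via_logs(23, 365) > 0.5)  # prints True
```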



^1 To experience this phenomenon experimentally, the reader is referred to [8].

Since each i is smaller than n, we can use the Taylor expansion
$\ln(1 - x) = -\sum_{m=1}^{\infty} x^m/m$ ($|x| < 1$) to get that

\[ -\ln(\pi_n^k) = \sum_{i=1}^{k-1}\sum_{m=1}^{\infty} \frac{(i/n)^m}{m} = \sum_{m=1}^{\infty} \frac{1}{m\,n^m} \sum_{i=1}^{k-1} i^m. \qquad (2) \]



(Changing the order of summation is possible because the sums involve posi- 
tive coefficients.) 

The coefficients $p(k-1, m) := \sum_{i=1}^{k-1} i^m$ (which are often called
sums of powers, or simply power sums) play a key role in our estimation of the
birthday probability. Efficient calculations of the first few power sums go
back to ancient mathematics.^2 In particular, we have: $p(k, 1) = k(k+1)/2$,
and $p(k, 2) = k(k+1)(2k+1)/6$. Higher order power sums can be found
recursively using Bernoulli numbers.

The Bernoulli numbers (which are indexed by superscripts) $1 = B^0, B^1, B^2,
B^3, B^4, \dots$ are defined by the formal equation "$B^n = (B - 1)^n$" for
$n > 1$, where the quotation marks indicate that the involved terms are to be
expanded in formal powers of B before interpreting. Thus:

• $B^2 = B^2 - 2B^1 + 1$, whence $B^1 = 1/2$,

• $B^3 = B^3 - 3B^2 + 3B^1 - 1$, whence $B^2 = 1/6$,

etc. We thus get that $B^3 = 0$, $B^4 = -1/30$, $B^5 = 0$, $B^6 = 1/42$,
$B^7 = 0$, and so on. It follows that for each m,

\[ p(k, m) = \text{``}\frac{(k + B)^{m+1} - B^{m+1}}{m + 1}\text{''} \]

(Faulhaber's formula [6]). Thus, the coefficients $p(k, m)$ can be efficiently
calculated for small values of m. In particular, we get that

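• $p(k, 3) = \frac{1}{4}k^4 + \frac{1}{2}k^3 + \frac{1}{4}k^2$,

• $p(k, 4) = \frac{1}{5}k^5 + \frac{1}{2}k^4 + \frac{1}{3}k^3 - \frac{1}{30}k$,

• $p(k, 5) = \frac{1}{6}k^6 + \frac{1}{2}k^5 + \frac{5}{12}k^4 - \frac{1}{12}k^2$,

• $p(k, 6) = \frac{1}{7}k^7 + \frac{1}{2}k^6 + \frac{1}{2}k^5 - \frac{1}{6}k^3 + \frac{1}{42}k$,

• $p(k, 7) = \frac{1}{8}k^8 + \frac{1}{2}k^7 + \frac{7}{12}k^6 - \frac{7}{24}k^4 + \frac{1}{12}k^2$,

etc. In order to show that this is enough, we need to bound the tail of the
series in Equation (2). We will achieve this by effectively bounding the power
sums.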
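
The recursion behind the Bernoulli numbers and the resulting power sums can be sketched in Python as follows. This is an illustrative implementation under the convention $B^1 = +1/2$ used above; the function names are ours. Expanding "$B^n = (B-1)^n$" cancels the $B^n$ terms and leaves a linear equation for $B^{n-1}$, and expanding "$(k+B)^{m+1} - B^{m+1}$" gives the polynomial for $p(k, m)$.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(count: int) -> list[Fraction]:
    """Return B^0, ..., B^(count-1), with the convention B^1 = +1/2."""
    B = [Fraction(1)]
    for n in range(2, count + 1):
        # From "B^n = (B - 1)^n":  n * B^(n-1) = sum_{j < n-1} C(n, j) (-1)^(n-j) B^j.
        total = sum(comb(n, j) * (-1) ** (n - j) * B[j] for j in range(n - 1))
        B.append(Fraction(total) / n)
    return B


def power_sum(k: int, m: int, B: list[Fraction]) -> Fraction:
    """p(k, m) = 1^m + ... + k^m via "((k + B)^(m+1) - B^(m+1)) / (m + 1)"."""
    total = sum(comb(m + 1, j) * B[j] * k ** (m + 1 - j) for j in range(m + 1))
    return Fraction(total) / (m + 1)


B = bernoulli_numbers(8)
print(B[1], B[2], B[4], B[6])  # prints 1/2 1/6 -1/30 1/42
```

The result can be compared against a brute-force sum for small k and m, which is a convenient sanity check before using the polynomials for large k.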



^2 Archimedes (ca. 287-212 BCE) provided a geometrical derivation of a "formula"
for the sum of squares [9].

Lemma 1 Let k be any natural number, and assume that $f : (0, k) \to \mathbb{R}^+$
is such that $f''(x)$ exists, and is nonnegative for all $x \in (0, k)$. Then:

\[ \sum_{i=1}^{k-1} f(i) \le \int_0^{k-1} f\left(x + \tfrac{1}{2}\right) dx. \]

Proof. For each interval $[i, i+1]$ ($i = 0, \dots, k-2$), the tangent to the
graph of $f(x + \frac{1}{2})$ at $x = i + \frac{1}{2}$ lies below the graph of
$f(x + \frac{1}{2})$. This implies that the area of the added part is at least
that of the uncovered part, so the integral over the interval is at least
$f(i+1)$. □
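
For the convex functions $f(x) = x^m$, which are the case needed for the power sums, the comparison between the sum and the shifted integral can be spot-checked exactly. The following Python sketch is only a numerical illustration, not a proof; the function name `power_sum_bounds` is hypothetical. It also reports the cruder bound $(k - \frac{1}{2})^{m+1}/(m+1)$ obtained by dropping the term coming from the lower limit of integration.

```python
from fractions import Fraction


def power_sum_bounds(k: int, m: int) -> tuple[int, Fraction, Fraction]:
    """Return p(k-1, m), the integral of (x + 1/2)^m over [0, k-1], and a cruder bound."""
    p = sum(i ** m for i in range(1, k))
    half = Fraction(1, 2)
    integral = ((k - half) ** (m + 1) - half ** (m + 1)) / (m + 1)
    crude = (k - half) ** (m + 1) / (m + 1)
    return p, integral, crude


p, integral, crude = power_sum_bounds(10, 3)
print(p <= integral < crude)  # prints True
```

For m = 1 the function is linear and the sum equals the integral exactly, which is why the first comparison is not strict in general.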



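Using Lemma 1 we have that for all $m \ge 1$,

\[ \sum_{i=1}^{k-1} i^m \le \int_0^{k-1} \left(x + \tfrac{1}{2}\right)^m dx < \frac{(k - \frac{1}{2})^{m+1}}{m + 1}. \]

Thus,

\[ \sum_{m=N}^{\infty} \frac{p(k-1, m)}{m\,n^m} < \sum_{m=N}^{\infty} \frac{(k - \frac{1}{2})^{m+1}}{m(m+1)\,n^m} \le \frac{k - \frac{1}{2}}{N(N+1)} \sum_{m=N}^{\infty} \left(\frac{k - \frac{1}{2}}{n}\right)^m = \frac{(k - \frac{1}{2})^{N+1}}{N(N+1)\,n^{N-1}\,(n - k + \frac{1}{2})}. \]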



We thus have the following. 

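Theorem 2 Let $\pi_n^k$ denote the probability that all elements in a sample of k
elements out of n are distinct. For a natural number N, define

\[ \epsilon_n^k(N) := \frac{(k - \frac{1}{2})^{N+1}}{N(N+1)\,n^{N-1}\,(n - k + \frac{1}{2})}. \]

Then

\[ \sum_{m=1}^{N-1} \frac{p(k-1, m)}{m\,n^m} < -\ln(\pi_n^k) < \sum_{m=1}^{N-1} \frac{p(k-1, m)}{m\,n^m} + \epsilon_n^k(N). \]

For example, for N = 2 we get:

\[ \frac{(k-1)k}{2n} < -\ln(\pi_n^k) < \frac{(k-1)k}{2n} + \frac{(k - \frac{1}{2})^3}{6n(n - k + \frac{1}{2})}. \]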
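
The bounds of Theorem 2 translate directly into a short computation. The following Python sketch is one possible rendering, with the tail term written as $(k - \frac{1}{2})^{N+1} / (N(N+1)\,n^{N-1}\,(n - k + \frac{1}{2}))$; the function names are illustrative, and the power sums are summed directly for clarity rather than through Faulhaber's formula, so the sketch is only suitable for small k.

```python
import math
from fractions import Fraction


def log_pi_bounds(k: int, n: int, N: int) -> tuple[Fraction, Fraction]:
    """Lower and upper bounds on -ln(pi) using the first N-1 terms of the series."""
    lower = Fraction(0)
    for m in range(1, N):
        p = sum(i ** m for i in range(1, k))  # p(k-1, m), by brute force
        lower += Fraction(p, m * n ** m)
    half = Fraction(1, 2)
    eps = (k - half) ** (N + 1) / (N * (N + 1) * n ** (N - 1) * (n - k + half))
    return lower, lower + eps


def beta_bounds(k: int, n: int, N: int) -> tuple[float, float]:
    """Bounds on the collision probability, beta = 1 - exp(ln(pi))."""
    lower, upper = log_pi_bounds(k, n, N)
    return -math.expm1(-float(lower)), -math.expm1(-float(upper))


lo, hi = log_pi_bounds(5, 365, 2)
print(lo, hi - lo)  # prints 2/73 243/2105320
```

Increasing N tightens the interval returned by `beta_bounds` at the cost of evaluating more power sums.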

We demonstrate the tightness of these bounds with a few concrete examples:



Example 3 Let us bound the probability that in a class of 5 students there
exist two sharing the same birthday. Using Theorem 2 with N = 2 we get
by simple calculation that $\frac{2}{73} < -\ln(\pi_{365}^{5}) < \frac{2}{73} + \frac{243}{2105320}$,
and numerically,^3 $0.0273972 < -\ln(\pi_{365}^{5}) < 0.0275127$. Thus,
$0.0270253 < \beta_{365}^{5} < 0.0271377$. Repeating the calculations with
N = 3 yields $0.0271349 < \beta_{365}^{5} < 0.0271356$. N = 4 shows that
$\beta_{365}^{5} = 0.0271355\ldots$.

Example 4 We bound the probability that in a class of 73 students there exist
two sharing the same birthday, using N = 2:
$\frac{36}{5} < -\ln(\pi_{365}^{73}) < \frac{36}{5} + \frac{121945}{204984}$,
and numerically we get that $0.9992534 < \beta_{365}^{73} < 0.9995882$. For
N = 3 we get $0.9995365 < \beta_{365}^{73} < 0.9995631$, and for N = 8 we get
that $\beta_{365}^{73} = 0.9995608\ldots$.

In Theorem 2, $\epsilon_n^k(N)$ converges to 0 exponentially fast with N. In
fact, the upper bound is a very good approximation to the actual probability,
as can be seen in the above examples. The reason for this is the effectiveness
of the bound in Lemma 1 (see [1] for an analysis of this bound as an
approximation).



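For $k < \sqrt{n}$, we can bound $\beta_n^k$ directly: Note that for
$0 < x < 1$ and odd M,

\[ \sum_{m=0}^{M} \frac{(-x)^m}{m!} < e^{-x} < \sum_{m=0}^{M+1} \frac{(-x)^m}{m!}. \]

Corollary 5 Let $\beta_n^k$ denote the probability of a birthday surprise in a
sample of k out of n, and let $l_N(k,n)$ and $u_N(k,n)$ be the lower and upper
bounds from Theorem 2, respectively. Then for all odd M,

\[ -\sum_{m=1}^{M+1} \frac{(-l_N(k,n))^m}{m!} < \beta_n^k < -\sum_{m=1}^{M} \frac{(-u_N(k,n))^m}{m!}. \]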



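For example, when M = 1 (and N = 2) we get that

\[ \frac{(k-1)k}{2n} - \frac{(k-1)^2 k^2}{8n^2} < \beta_n^k < \frac{(k-1)k}{2n} + \frac{(k - \frac{1}{2})^3}{6n(n - k + \frac{1}{2})}. \qquad (3) \]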



The explicit bounds become more complicated when M > 1, but once the
lower and upper bounds in Theorem 2 are computed numerically, bounding
$\beta_n^k$ using Corollary 5 is easy. However, Corollary 5 is not really
needed in order to deduce the bounds: these can be calculated directly from
the bounds of Theorem 2, e.g. using the exponential function built into
calculators.
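
The simplest case of Corollary 5, M = 1 combined with N = 2 in Theorem 2, needs no exponential function at all. The Python sketch below evaluates it in exact rational arithmetic; the function name and variable names are illustrative, and the bounds used are $l - l^2/2 < \beta < u$ with $l = (k-1)k/(2n)$ and $u = l + (k - \frac{1}{2})^3 / (6n(n - k + \frac{1}{2}))$.

```python
from fractions import Fraction


def polynomial_beta_bounds(k: int, n: int) -> tuple[Fraction, Fraction]:
    """Rational lower and upper bounds on the collision probability (M = 1, N = 2)."""
    log_lower = Fraction(k * (k - 1), 2 * n)
    half = Fraction(1, 2)
    log_upper = log_lower + (k - half) ** 3 / (6 * n * (n - k + half))
    # 1 - e^(-x) > x - x^2/2 and 1 - e^(-x) < x for every x > 0.
    return log_lower - log_lower ** 2 / 2, log_upper


lo, hi = polynomial_beta_bounds(5, 365)
print(float(lo), float(hi))
```

These bounds are only tight when k is small compared with the square root of n; for larger k a larger M, or the exponential function itself, is the better choice.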



^3 All calculations in this paper were performed using the GNU bc calculator [5],
with a scale of 500 digits.

Remark 6 (1) It can be proved directly that in fact
$\beta_n^k \le \frac{(k-1)k}{2n}$ [2]. However, it is not clear how to extend
the direct argument to get tighter bounds in a straightforward manner.

(2) Our lower bound in Equation (3) compares favorably with the lower bound
$\left(1 - \frac{1}{e}\right)\frac{(k-1)k}{2n}$ from [2] when $k < 2\sqrt{n/e}$
(when $k > 2\sqrt{n/e}$ we need to take larger values of M to get a better
approximation).

(3) $p(k-1, m)$ is bounded from below by $(k-1)^{m+1}/(m+1)$. This implies
a slight improvement on Theorem 2.



3 Some applications 

3.1 Arbitrary precision calculators

Arbitrary precision calculators do calculations to any desired level of
accuracy. Well-known examples are the bc and GNU bc [5] calculators. Theorem 2
allows calculating $\beta_n^k$ to any desired level of accuracy (in this case,
the parameter N will be determined by the required level of accuracy), and in
practical time. An example of such a calculation appears below (Example 7).



3.2 Cryptography 

The probability of a birthday surprise plays an important role in the security
analysis of various cryptographic systems. For this purpose, it is common to
use the approximation $\beta_n^k \approx k^2/2n$. However, in concrete security
analysis it is preferred to have exact bounds rather than estimations (see [1]
and references therein).

The second item of Remark 6 implies that security bounds derived using earlier
methods are tighter than previously thought. The following example
demonstrates the tightness of the bounds of Theorem 2 for these purposes.

Example 7 In [3], $\beta_{2^{128}}^{2^{32}}$ is estimated approximately. Using
Theorem 2 with N = 2, we get that in fact,

\[ 2^{-65.0000000003359036150250796039103} < \beta_{2^{128}}^{2^{32}} < 2^{-65.0000000003359036150250796039042}. \]

With N = 3 we get that $\beta_{2^{128}}^{2^{32}}$ lies between

\[ 2^{-65.000000000335903615025079603904203942942489665995829764250752} \]

and

\[ 2^{-65.000000000335903615025079603904203942942489665995829764250713}. \]

The remarkable tightness of these bounds is due to the fact that $2^{32}$ is
much smaller than $2^{128}$.
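
A calculation of this kind can be reproduced with any arbitrary precision arithmetic. The sketch below uses Python's standard `decimal` and `fractions` modules instead of bc, taking $k = 2^{32}$ and $n = 2^{128}$ and the N = 2 bounds $l = (k-1)k/(2n)$ and $u = l + (k - \frac{1}{2})^3 / (6n(n - k + \frac{1}{2}))$ on $-\ln(\pi_n^k)$. The function names and the working precision of 120 digits are arbitrary choices made here for illustration.

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 120  # working precision in decimal digits (assumed, generous)


def to_decimal(x: Fraction) -> Decimal:
    """Convert an exact fraction to a Decimal at the current precision."""
    return Decimal(x.numerator) / Decimal(x.denominator)


def log2_of_beta_bound(log_bound: Fraction) -> Decimal:
    """log2(1 - exp(-log_bound)), where log_bound is a bound on -ln(pi)."""
    beta = Decimal(1) - (-to_decimal(log_bound)).exp()
    return beta.ln() / Decimal(2).ln()


def log2_beta_bounds(k: int, n: int) -> tuple[Decimal, Decimal]:
    """Base-2 logarithms of the N = 2 lower and upper bounds on beta."""
    half = Fraction(1, 2)
    lower = Fraction(k * (k - 1), 2 * n)
    upper = lower + (k - half) ** 3 / (6 * n * (n - k + half))
    return log2_of_beta_bound(lower), log2_of_beta_bound(upper)


lo, hi = log2_beta_bounds(2 ** 32, 2 ** 128)
print(lo)
print(hi)
```

Because only the closed forms for $p(k-1, 1)$ and the tail term are evaluated, the running time does not depend on the size of k.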

Another application of our results is for estimations of the quality of
approximations such as $\binom{n}{k} \approx n^k/k!$ (when $k \ll n$):

\[ \binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k!} = \pi_n^k \cdot \frac{n^k}{k!}. \]

Thus the quality of this approximation is directly related to the quality of
the approximation $\pi_n^k \approx 1$, which is well understood via Theorem 2.
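
The relation between the binomial coefficient and the probability of distinct samples is an exact identity, $\binom{n}{k} = \pi_n^k \cdot n^k/k!$, and can be confirmed with rational arithmetic. The following Python sketch does so for small parameters; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def binomial_ratio(n: int, k: int) -> Fraction:
    """Exact ratio of C(n, k) to its approximation n^k / k!."""
    return Fraction(comb(n, k) * factorial(k), n ** k)


def pi_distinct(k: int, n: int) -> Fraction:
    """Probability that k uniform samples from n values are all distinct."""
    prob = Fraction(1)
    for i in range(1, k):
        prob *= Fraction(n - i, n)
    return prob


print(binomial_ratio(365, 5) == pi_distinct(5, 365))  # prints True
```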

$\pi_n^k$ appears in many other natural contexts. For example, assume that a
function $f : \{0, \dots, n-1\} \to \{0, \dots, n-1\}$ is chosen with uniform
probability from the set of all such functions, and fix an element
$x \in \{0, \dots, n-1\}$. Then we have the following immediate observation.

Fact 9 The probability that the orbit of x under f has size exactly k is
$\pi_n^k \cdot \frac{k}{n}$. The probability that the size of the orbit of x is
larger than k is simply $\pi_n^{k+1}$.

These probabilities play an important role in the theory of iterative
pseudorandom number generation (see [10] for a typical example).
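
For very small n the orbit-size distribution can be verified by enumerating all $n^n$ functions. The Python sketch below does this exhaustively and compares the result with the two expressions: probability $\pi_n^k \cdot k/n$ for an orbit of size exactly k, and $\pi_n^{k+1}$ for an orbit larger than k. It is an illustrative check only, since the enumeration grows as $n^n$; all function names are ours.

```python
from fractions import Fraction
from itertools import product


def pi_distinct(k: int, n: int) -> Fraction:
    """Probability that k uniform samples from n values are all distinct."""
    prob = Fraction(1)
    for i in range(1, k):
        prob *= Fraction(n - i, n)
    return prob


def orbit_size(f: tuple[int, ...], x: int) -> int:
    """Number of distinct points among x, f(x), f(f(x)), ..."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = f[x]
    return len(seen)


def orbit_size_distribution(n: int, x: int = 0) -> dict[int, Fraction]:
    """Exact distribution of the orbit size of x over all functions on {0, ..., n-1}."""
    counts = {k: 0 for k in range(1, n + 1)}
    for f in product(range(n), repeat=n):  # f[i] is the image of i
        counts[orbit_size(f, x)] += 1
    return {k: Fraction(c, n ** n) for k, c in counts.items()}


dist = orbit_size_distribution(4)
print(all(dist[k] == pi_distinct(k, 4) * Fraction(k, 4) for k in dist))  # prints True
```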



4 Final remarks and acknowledgments 

For a nice account of power sums see [7]. An accessible presentation and proof
of Faulhaber's formula appears in [6]. The author thanks John H. Conway for 
the nice introduction to Bernoulli numbers, and Ron Adin for reading this 
note and detecting some typos. 